Poster: Seaware Date: Oct 3, 2012 9:49pm
Forum: petabox Subject: How long does the data last?

Are there estimates of how many years the data will last before it needs to be copied to a new drive? Also, is ECC used, so that if some data is lost over a given period it is still likely to be recoverable? I am curious whether this archive would really count as an archive from the perspective of someone 1000 years from now.

Poster: Coderjoe (Administrator, Curator, or Staff) Date: Oct 8, 2012 1:49pm
Forum: petabox Subject: Re: How long does the data last?

The data is stored on two separate hardware nodes as soon as it is uploaded to archive.org. As far as I know, the system does not add any ECC beyond what the hard drive does internally. However, one of each item's XML files stores a list of the item's files along with their checksums, which can be used to verify the files on each node.
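As a rough illustration of how a stored checksum list can be used to verify files, here is a minimal Python sketch. The manifest layout (a `<files>` root holding `<file name="...">` elements, each with a `<sha1>` child) and the function names `sha1_of` and `verify_item` are assumptions made for this example; nothing here is the archive's actual verification code.

```python
import hashlib
import os
import xml.etree.ElementTree as ET


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_item(item_dir, manifest_xml):
    """Return the names of files that are missing or whose SHA-1 differs.

    Assumed (hypothetical) manifest layout:
      <files><file name="a.txt"><sha1>...</sha1></file></files>
    """
    bad = []
    root = ET.fromstring(manifest_xml)
    for entry in root.iter("file"):
        name = entry.get("name")
        expected = entry.findtext("sha1")
        if name is None or expected is None:
            continue  # entry carries no checksum we can test
        path = os.path.join(item_dir, name)
        if not os.path.isfile(path) or sha1_of(path) != expected.strip().lower():
            bad.append(name)
    return bad
```

Running the same check on both nodes would show which copy, if either, has drifted from the recorded checksum, so the good copy could be used to repair the bad one.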

Poster: Seaware Date: Oct 9, 2012 12:47am
Forum: petabox Subject: Re: How long does the data last?

Thanks. So if the half-life of the data on the disk is 100 years (for example), would the drive be powered on and the data checked at least once during that period, with the first failing checksum triggering replication to a fresh drive? Also, I hope you are using a CRC rather than a pure checksum, since a CRC is more likely to catch multi-bit errors.
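To make the CRC point concrete, here is a small Python sketch. It assumes "pure checksum" means a simple additive sum of the bytes, which is only one possible reading; `additive_checksum` is a made-up helper for this example, and `zlib.crc32` is the standard library's CRC-32.

```python
import zlib


def additive_checksum(data: bytes) -> int:
    """Simple 16-bit sum of all bytes (order-insensitive by construction)."""
    return sum(data) & 0xFFFF


original = b"petabox archive data"

# Simulate a multi-bit error: the first two bytes trade places.
corrupted = bytearray(original)
corrupted[0], corrupted[1] = corrupted[1], corrupted[0]
corrupted = bytes(corrupted)

# The additive sum cannot see the swap; CRC-32 can.
print(additive_checksum(original) == additive_checksum(corrupted))  # True
print(zlib.crc32(original) == zlib.crc32(corrupted))                # False
```

The swap changes 16 adjacent bits, and CRC-32 detects every error burst of 32 bits or fewer, so the mismatch in the second line is guaranteed rather than a matter of luck.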

Poster: Coderjoe (Administrator, Curator, or Staff) Date: Oct 10, 2012 11:04pm
Forum: petabox Subject: Re: How long does the data last?

I don't know the low-level details, so I can't say whether the data is scrubbed regularly. I also don't know what the procedure is when a drive fails and needs to be replaced.

Looking at the files.xml file for a random item right now, the system records SHA-1, MD5, and CRC32 values for each file. It also stores the file size and mtime.
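For a sense of what producing those per-file values involves, here is a minimal Python sketch that computes all five in a single pass over a file. The function name `file_fingerprint`, the dictionary keys, and the hex formatting are assumptions for illustration, not a description of how the archive generates files.xml.

```python
import hashlib
import os
import zlib


def file_fingerprint(path, chunk_size=1 << 20):
    """Compute SHA-1, MD5, and CRC32 in one read, plus size and mtime."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)  # carry the running CRC forward
    st = os.stat(path)
    return {
        "sha1": sha1.hexdigest(),
        "md5": md5.hexdigest(),
        "crc32": format(crc & 0xFFFFFFFF, "08x"),  # 8 hex digits, zero-padded
        "size": st.st_size,
        "mtime": int(st.st_mtime),
    }
```

Size and mtime are cheap to compare and catch truncation or accidental rewrites quickly, while the hashes catch silent changes to the content itself.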